Cap decompressed size for deflate/zstd/lz4/packbits to block bomb attacks#1533
Open
brendancol wants to merge 1 commit intoxarray-contrib:mainfrom
Open
Cap decompressed size for deflate/zstd/lz4/packbits to block bomb attacks#1533brendancol wants to merge 1 commit intoxarray-contrib:mainfrom
brendancol wants to merge 1 commit intoxarray-contrib:mainfrom
Conversation
…acks
The four codecs in ``xrspatial/geotiff/_compression.py`` decompressed strip
and tile payloads with no output-size cap, so a small malicious TIFF could
expand to multiple gigabytes during decode and OOM-kill the reader. An
audit confirmed it: a 1 MiB deflate-compressed strip declaring a 1024x1024
image expands to 1 GiB, and a process with ``RLIMIT_AS=300MB`` is killed
before the existing post-decode size check (``_decode_strip_or_tile`` in
``_reader.py:565``) ever runs. The threat model is untrusted TIFF input
from web downloads, fsspec, third-party catalogs, or user upload.
Each codec now accepts an ``expected_size`` kwarg (the byte count the
caller already computed for the post-check) and refuses to emit more than
``expected_size * 1.05 + 1`` bytes before raising ``ValueError`` with the
codec name and actual vs cap. The 1.05x margin allows for legitimate
codec metadata that some encoders emit; bomb ratios (1000:1+) are
rejected long before peak RSS spikes.
Per-codec implementation:
- deflate: ``zlib.decompressobj().decompress(data, max_length=cap+1)``
with a drain loop over ``unconsumed_tail``. ``zlib.decompress`` had no
cap.
- zstd: ``ZstdDecompressor().stream_reader(data).read(cap+1)``, then
probe one more byte to detect overflow. ``decompress(data,
max_output_size=cap)`` is not actually enforced when the frame embeds
the content size, which the bomb does.
- lz4: ``LZ4FrameDecompressor().decompress(data, max_length=cap+1)`` and
treat ``needs_input == False`` as overflow (decoder has buffered output
it could not deliver).
- packbits: pure-Python loop already, so just check the running output
length after each opcode.
The dispatch ``decompress`` plumbs ``expected_size`` through to all four.
Tests in ``xrspatial/geotiff/tests/test_decompression_caps.py`` cover:
codec-direct bomb rejection, end-to-end TIFF bomb rejection (1 MiB
declared, 1 GiB decoded), legitimate high-ratio compression (all-zero
arrays at >50:1) passing without false rejection, and the 5% metadata
margin not over-rejecting.
Audit reproducer behaviour: the 1 MiB-to-1 GiB TIFF previously triggered
``MemoryError`` (or OS OOM kill) in zlib.decompress; now raises
``ValueError("deflate decode exceeded expected size: ...")`` with peak
RSS bounded by the cap.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four codecs in
xrspatial/geotiff/_compression.pydecompressed strip and tile payloads with no output-size cap. A small malicious TIFF (web download, fsspec, third-party catalog, user upload) could declare a 1024x1024 uint8 image (1 MiB expected) and supply a 1 MiB deflate-compressed strip that decodes to 1 GiB, OOM-killing the reader. The existing size check at_decode_strip_or_tile(_reader.py:565) runs after decompression, so it cannot bound peak RSS. Audit confirmed by settingRLIMIT_AS=300MBand feeding a 1 MiB deflate-bombed TIFF toread_to_array: process is killed.This PR adds an
expected_sizecap inside each codec, rejecting any decode that would exceedexpected_size * 1.05 + 1bytes with a cleanValueErrorbefore peak allocation.Codecs covered (one fix template, four codecs)
deflate_decompress):zlib.decompressobj().decompress(data, max_length=cap+1)with a drain loop.zlib.decompresshas no cap parameter.zstd_decompress):ZstdDecompressor().stream_reader(data).read(cap+1)plus a one-byte overflow probe.decompress(data, max_output_size=cap)is not enforced when the frame embeds the content size, which is exactly the case in the bomb.lz4_decompress):LZ4FrameDecompressor().decompress(data, max_length=cap+1)and treatneeds_input == Falseas overflow.packbits_decompress): pure-Python loop already; check the running output length after each opcode.Margin
The 1.05x margin (5%) lets legitimate codec framing or trailing metadata bytes through while still rejecting the 1000:1 ratios characteristic of bomb attacks. A test (
test_cap_includes_metadata_margin) covers the legitimate metadata case, andtest_legitimate_high_compression_passesconfirms an all-zero array at >50:1 ratio is accepted.Tests
xrspatial/geotiff/tests/test_decompression_caps.py(14 tests):The end-to-end TIFFs are hand-built with
struct.packso no test ever materializes the 1 GiB payload outside the codec's own buffer.Out of scope
Per project policy on one-fix-per-PR for security work, the user explicitly opted to bundle these four codec caps into one PR (same vulnerability class, same fix template). Other audit findings (IFD count, IFD chain) ship in separate PRs.
Test plan
pytest xrspatial/geotiff/tests/test_decompression_caps.py -x -q(14 pass)pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_compression_level.py xrspatial/geotiff/tests/test_lz4.py xrspatial/geotiff/tests/test_reader.py xrspatial/geotiff/tests/test_writer.py -x -q(63 pass, no regressions)